
    Acceleration Methods for Classic Convex Optimization Algorithms

    Full text link
    Unpublished doctoral thesis, read at the Universidad Autónoma de Madrid, Escuela Politécnica Superior, Departamento de Ingeniería Informática. Defense date: 12-09-2017.
    Most Machine Learning models are defined in terms of a convex optimization problem. Thus, developing algorithms to quickly solve such problems is of great interest to the field. In this thesis we focus on two of the most widely used models, the Lasso and Support Vector Machines. The former belongs to the family of regularization methods, and it was introduced in 1996 to perform both variable selection and regression at the same time. This is accomplished by adding an ℓ1-regularization term to the least squares model, achieving interpretability as well as a good generalization error. Support Vector Machines were originally formulated to solve a classification problem by finding the maximum-margin hyperplane, that is, the hyperplane which separates two sets of points and is at equal distance from both of them. SVMs were later extended to handle non-separable classes and non-linear classification problems by applying the kernel trick. A first contribution of this work is a careful analysis of the existing algorithms for both problems, describing not only the theory behind them but also the possible advantages and disadvantages of each one. Although the Lasso and SVMs solve very different problems, we show in this thesis that they are equivalent. Following a recent result by Jaggi, given an instance of one model we can construct an instance of the other having the same solution, and vice versa. This equivalence allows us to translate theoretical and practical results, such as algorithms, from one field to the other, even though the two fields have otherwise been developed independently. We give not only the theoretical result but also a practical application, which consists in solving the Lasso problem using the SMO algorithm, the state-of-the-art solver for non-linear SVMs. We also perform experiments comparing SMO to GLMNet, one of the most popular solvers for the Lasso. The results obtained show that SMO is competitive with GLMNet, and sometimes even faster. Furthermore, motivated by a recent trend in which classical optimization methods are being rediscovered in improved forms and successfully applied to many problems, we have also analyzed two classical momentum-based methods: the Heavy Ball algorithm, introduced by Polyak in 1963, and Nesterov's Accelerated Gradient, discovered by Nesterov in 1983. In this thesis we develop practical versions of Conjugate Gradient, which is essentially equivalent to the Heavy Ball method, and of Nesterov's Acceleration for the SMO algorithm. Experiments comparing the convergence of all the methods are also carried out. The results show that the proposed algorithms can achieve faster convergence both in terms of iterations and of execution time.
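    For reference, the model and the two momentum updates mentioned above can be written in standard form (generic notation, not necessarily the thesis's own symbols):

        % Lasso: l1-regularized least squares; lambda >= 0 trades fit for sparsity
        \min_{w \in \mathbb{R}^d} \; \tfrac{1}{2} \lVert X w - y \rVert_2^2 + \lambda \lVert w \rVert_1

        % Heavy Ball (Polyak, 1963): gradient step plus a momentum term
        w_{k+1} = w_k - \eta \nabla f(w_k) + \beta \, (w_k - w_{k-1})

        % Nesterov's Accelerated Gradient (1983): gradient taken at an extrapolated point
        v_k = w_k + \mu_k \, (w_k - w_{k-1}), \qquad w_{k+1} = v_k - \eta \nabla f(v_k)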

    ν-SVM solutions of constrained lasso and elastic net

    Full text link
    Many important linear sparse models have at their core the Lasso problem, for which the GLMNet algorithm is often considered the current state of the art. Recently, M. Jaggi observed that the Constrained Lasso (CL) can be reduced to an SVM-like problem, for which the LIBSVM library provides very efficient algorithms. This suggests that LIBSVM could also be used advantageously to solve CL. In this work we refine Jaggi's arguments to reduce CL, as well as the constrained Elastic Net, to a Nearest Point Problem, which in turn can be rewritten as an appropriate ν-SVM problem solvable by LIBSVM. We also show experimentally that the well-known LIBSVM library results in faster convergence than GLMNet for small problems and also, if properly adapted, for larger ones. Screening is another ingredient used to speed up Lasso solvers; shrinking can be seen as the simpler SVM alternative to screening, and we discuss how it may also, in some cases, reduce the cost of an SVM-based CL solution.
    With partial support from Spanish government grants TIN2013-42351-P, TIN2016-76406-P, TIN2015-70308-REDT and S2013/ICE-2845 CASI-CAM-CM; work also supported by project FACIL–Ayudas Fundación BBVA a Equipos de Investigación Científica 2016 and the UAM–ADIC Chair for Data Science and Machine Learning. The first author is also supported by the FPU–MEC grant AP-2012-5163. We gratefully acknowledge the use of the facilities of Centro de Computación Científica (CCC) at UAM and thank Red Eléctrica de España for kindly supplying wind energy data.
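    As a rough sketch of the reduction discussed above (generic notation; the paper's own derivation is more careful):

        % Constrained Lasso (CL): least squares under an l1 ball of radius t
        \min_{w} \; \lVert X w - y \rVert_2^2 \quad \text{s.t.} \quad \lVert w \rVert_1 \le t

        % Splitting w = t (u^+ - u^-) with u^+, u^- \ge 0 and \mathbf{1}^\top (u^+ + u^-) = 1
        % turns CL into a quadratic problem over the simplex \Delta:
        \min_{\alpha \in \Delta} \; \lVert \tilde{X} \alpha - y \rVert_2^2, \qquad \tilde{X} = t \, [X, -X]

        % i.e. the Nearest Point Problem of finding the point in the convex hull of the
        % columns of \tilde{X} closest to y, which can then be cast as a \nu-SVM dual.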

    Deep fisher discriminant analysis

    Full text link
    Fisher Discriminant Analysis' linear nature and the usual eigen-analysis approach to its solution have limited the application of its underlying elegant idea. In this work we take advantage of some recent, partially equivalent formulations based on standard least squares regression to develop a simple Deep Neural Network (DNN) extension of Fisher's analysis that greatly improves on its ability to cluster sample projections around their class means while keeping these means apart. This is shown by the much better accuracies and g scores of class-mean classifiers when applied to the features provided by simple DNN architectures than what can be achieved using Fisher's linear ones.
    With partial support from Spain's grants TIN2013-42351-P, TIN2016-76406-P, TIN2015-70308-REDT and S2013/ICE-2845 CASI-CAM-CM. Work also supported by project FACIL–Ayudas Fundación BBVA a Equipos de Investigación Científica 2016, the UAM–ADIC Chair for Data Science and Machine Learning and the Instituto de Ingeniería del Conocimiento. The third author is also supported by the FPU–MEC grant AP-2012-5163. We gratefully acknowledge the use of the facilities of Centro de Computación Científica (CCC) at UAM.
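    For context, the underlying idea can be stated generically (standard notation; the exact least squares coding used in the paper may differ):

        % Fisher's criterion: separate class means while keeping each class compact
        \max_{w} \; \frac{w^\top S_B \, w}{w^\top S_W \, w}

        % Partially equivalent least squares view: with suitably coded class targets T,
        % the projection W solves a regression problem,
        \min_{W} \; \lVert X W - T \rVert_F^2

        % and a deep extension replaces the linear map X W with a learned network f_\Theta(X).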

    How do tuna schools associate to dFADs? A study using echo-sounder buoys to identify global patterns

    Full text link
    Based on the data gathered by echo-sounder buoys attached to drifting Fish Aggregating Devices (dFADs) across tropical oceans, the current study applies a Machine Learning protocol to examine the temporal trends of tuna schools' association with drifting objects. Using a binary output, metrics typically used in the literature were adapted to account for the fact that the entire tuna aggregation under the dFAD was considered. The median time it took tuna to colonize the dFADs for the first time varied between 25 and 43 days, depending on the ocean, and the longest soak and colonization times were registered in the Pacific Ocean. The tuna schools' Continuous Residence Times were generally shorter than their Continuous Absence Times (median values between 5 and 7 days, and 9 and 11 days, respectively), in line with the results of previous studies. Using a regression output, two novel metrics, namely aggregation time and disaggregation time, were estimated to obtain further insight into the symmetry of the aggregation process. Across all oceans, the time it took for the tuna aggregation to depart from the dFADs was not significantly longer than the time it took for the aggregation to form. The value of these results in the context of the "ecological trap" hypothesis is discussed, and further analyses to enrich and make use of this data source are proposed.
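    As an illustration of how such residence metrics can be computed, a minimal Python sketch (hypothetical helper, not the study's code), assuming a daily binary presence series per buoy:

        # Minimal sketch (hypothetical, not the study's code): Continuous Residence
        # Time (CRT) and Continuous Absence Time (CAT) from a daily presence series.
        from itertools import groupby
        from statistics import median

        def run_lengths(presence, value):
            """Lengths of consecutive runs of `value` in a binary sequence (days)."""
            return [sum(1 for _ in grp) for val, grp in groupby(presence) if val == value]

        # presence[i] = 1 if a tuna aggregation was detected under the dFAD on day i
        presence = [0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0]

        crt = run_lengths(presence, 1)   # continuous residence times, in days
        cat = run_lengths(presence, 0)   # continuous absence times, in days
        print(median(crt), median(cat))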

    Tuna-AI: tuna biomass estimation with Machine Learning models trained on oceanography and echosounder FAD data

    Full text link
    Echo-sounder data registered by buoys attached to drifting FADs provide a very valuable source of information on tuna populations and their behaviour. This value increases when these data are supplemented with oceanographic data coming from CMEMS. We use these sources to develop Tuna-AI, a Machine Learning model aimed at predicting tuna biomass under a given buoy, which uses a 3-day window of echo-sounder data to capture the daily spatio-temporal patterns characteristic of tuna schools. As the supervised signal for training, we employ more than 5000 set events with their corresponding tuna catch reported by the AGAC tuna purse seine fleet.

    TUN-AI: Tuna biomass estimation with Machine Learning models trained on oceanography and echosounder FAD data

    Get PDF
    The use of dFADs by tuna purse-seine fisheries is widespread across oceans, and the echo-sounder buoys attached to these dFADs provide fishermen with estimates of the tuna biomass aggregated to them. This information has potential for gaining insight into tuna behaviour and abundance, but has traditionally been difficult to process and use. The current study combines FAD logbook data, oceanographic data and echo-sounder buoy data to evaluate different Machine Learning models and establish a pipeline, named TUN-AI, for processing echo-sounder buoy data and estimating tuna biomass (in metric tons, t) at various levels of complexity: binary classification, ternary classification and regression. Models were trained and tested on over 5000 sets and over 6000 deployments. Of all the models evaluated, the best performing one uses a 3-day window of echo-sounder data, oceanographic data and position/time derived features. This model is able to estimate whether tuna biomass was above or below 10 t with an F1-score of 0.925. When directly estimating tuna biomass, the best model (Gradient Boosting) has an error (MAE) of 21.6 t and a relative error (SMAPE) of 29.5% when evaluated over sets. All models tested improved when enriched with oceanographic and position-derived features, highlighting the importance of these features when using echo-sounder buoy data. Potential applications of this methodology, and future improvements, are discussed.
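    To make the reported metrics concrete, a minimal evaluation sketch in generic scikit-learn (illustrative names and placeholder data; not the TUN-AI pipeline itself):

        # Illustrative sketch only (generic scikit-learn, not the TUN-AI pipeline):
        # a gradient-boosting regressor evaluated with the MAE and SMAPE metrics
        # reported in the abstract.
        import numpy as np
        from sklearn.ensemble import GradientBoostingRegressor
        from sklearn.metrics import mean_absolute_error

        def smape(y_true, y_pred):
            """Symmetric mean absolute percentage error, in %."""
            denom = (np.abs(y_true) + np.abs(y_pred)) / 2.0
            return 100.0 * np.mean(np.abs(y_pred - y_true) / denom)

        # X: echo-sounder + oceanographic + position/time features; y: biomass (t)
        X, y = np.random.rand(500, 20), np.random.rand(500) * 100  # placeholder data

        model = GradientBoostingRegressor().fit(X[:400], y[:400])
        pred = model.predict(X[400:])
        print(mean_absolute_error(y[400:], pred), smape(y[400:], pred))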